Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new pganalyze.explain_analyze() helper #655

Merged
merged 3 commits into from
Jan 6, 2025

Conversation

lfittl
Copy link
Member

@lfittl lfittl commented Dec 19, 2024

This executes on-demand query runs for the Query Tuning feature via
a helper function that prevents reading the table data directly, as well
as granting the collector user unnecessary permissions.

This is now the only way that query runs can execute EXPLAIN ANALYZE,
direct execution without the helper is intentionally not supported.

@lfittl lfittl requested a review from a team December 19, 2024 02:57
@keiko713
Copy link
Contributor

I think overall this is looking good (I admit that I haven't checked the SQL part).
Are we going to retire the enable_query_runner flag in the different PR? Also, we will (well, the app will) want to know if the helper function exists or not, so it would be great if it's reported as a collector config.

@lfittl
Copy link
Member Author

lfittl commented Dec 19, 2024

Are we going to retire the enable_query_runner flag in the different PR? Also, we will (well, the app will) want to know if the helper function exists or not, so it would be great if it's reported as a collector config.

Yeah, I was going to remove the flag in a different PR.

And good point re: the helper - though technically we can also get that via the schema function data that already gets reported.

The main reason to stay with that existing flow that I could see (vs also including it in collector info), is that this is a per-database helper, like the Index Advisor helpers. Storing this information in collector info for many databases might not be ideal?

@keiko713
Copy link
Contributor

The main reason to stay with that existing flow that I could see (vs also including it in collector info), is that this is a per-database helper, like the Index Advisor helpers. Storing this information in collector info for many databases might not be ideal?

Ah, interesting, so you'll need to create a helper function to all databases (if you have many databases). Yeah in that case, sending out in the collector info doesn't make sense.
In that case, we may want to consider adding the function to https://github.com/pganalyze/collector/blob/main/runner/generate_helper_sql.go or providing some similar things for this.

I think I can workaround by checking the function definition on the app side 👍

@lfittl lfittl force-pushed the explain-function-helper branch 2 times, most recently from 0771dbe to c61e0e4 Compare December 24, 2024 04:52
TEST_DATABASE_URL: postgresql://postgres:postgres@localhost:5432/postgres?sslmode=disable
services:
postgres:
image: postgres:14
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 14 as a minimum here since the EXPLAIN ANALYZE tests rely on being able to turn off compute_query_id in the output (and that was added in 14).

@lfittl
Copy link
Member Author

lfittl commented Dec 24, 2024

@keiko713 @seanlinsley FYI, this is ready for re-review. I ended up adding both parameter support and Postgres setting support here as well, since it was quick enough, and even if the server doesn't use it that works fine.

protobuf/server_message.proto Outdated Show resolved Hide resolved
util/sql_helpers.go Show resolved Hide resolved
input/postgres/explain_analyze.go Outdated Show resolved Hide resolved
input/postgres/explain_analyze.go Outdated Show resolved Hide resolved
input/postgres/explain_analyze.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
util/explain_analyze_helper.sql Outdated Show resolved Hide resolved
util/explain_analyze_helper.sql Outdated Show resolved Hide resolved
runner/query_run.go Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@lfittl lfittl force-pushed the explain-function-helper branch 4 times, most recently from dc15ca0 to f192625 Compare December 28, 2024 00:08
@lfittl
Copy link
Member Author

lfittl commented Dec 28, 2024

@seanlinsley @keiko713 This is ready for re-review, all feedback should be addressed.

runner/websocket.go Outdated Show resolved Hide resolved
runner/query_run.go Show resolved Hide resolved
@@ -130,6 +130,34 @@ func Run(ctx context.Context, wg *sync.WaitGroup, globalCollectionOpts state.Col
return
}

if globalCollectionOpts.GenerateExplainAnalyzeHelperSql != "" {
wg.Add(1)
testRunSuccess = make(chan bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be part of this PR, but we may want to add test run output for the presence of the helper function and success on running a test explain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point. I'll add this in a follow-up PR.

PERFORM 1 FROM pg_roles WHERE (rolname = current_user AND rolsuper) OR (pg_has_role(oid, 'MEMBER') AND rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser'));
IF FOUND THEN
RAISE EXCEPTION 'cannot run: helper is owned by superuser - recreate function with lesser privileged user';
END IF;
Copy link
Member

@seanlinsley seanlinsley Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error occurs in local testing because my Postgres user is a superuser. Maybe the OR should be an AND instead?

For reference, these are the values for the different where clauses:

select rolsuper, pg_has_role(oid, 'MEMBER'), rolname IN ('rds_superuser', 'azure_pg_admin', 'cloudsqlsuperuser') from pg_roles where rolname = current_user;
 rolsuper | pg_has_role | ?column? 
----------+-------------+----------
 t        | t           | f

If the logic in this check is intentional, what should I do to my local Postgres install to work around it, and how should we communicate this issue in the docs?

Copy link
Member Author

@lfittl lfittl Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you follow the instructions on setting up the separate user (preview) it should work as expected, even in a local setup. Its intentionally an OR, since we want to block users that run Postgres on a VM to use a real superuser for this.

runner/query_run.go Outdated Show resolved Hide resolved
@seanlinsley
Copy link
Member

This looks good overall, but I did run into one bug: the statement timeout doesn't seem to take effect. For example if I run a SELECT pg_sleep(90) query through the system, the collector responds back with JSON including "Execution Time": 90001.987.

@lfittl
Copy link
Member Author

lfittl commented Jan 2, 2025

This looks good overall, but I did run into one bug: the statement timeout doesn't seem to take effect. For example if I run a SELECT pg_sleep(90) query through the system, the collector responds back with JSON including "Execution Time": 90001.987.

Strange - I had tested that and it worked for me, but I can recheck tomorrow. It sounds like you got an actual result vs the EXPLAIN without ANALYZE that should happen when the query times out.

Copy link
Contributor

@msakrejda msakrejda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the username usage in the helper generation script. Otherwise, looks good.

input/postgres/explain_analyze.go Outdated Show resolved Hide resolved
runner/generate_helper_sql.go Outdated Show resolved Hide resolved
Some time ago (at least 9+ years) we introduced a maximum connection
lifetime of 30 seconds via the SetConnMaxLifetime database/sql function.

This was likely intended as an extra safety measure in case the collector
didn't clean up its DB handle correctly. We've had a few minor issues
with incorrect cleanups over the years (though usually not with the DB
connection), but at this point our connection cleanup should be reliable.

The connection lifetime causes an issue because it causes Go to
reconnect to the database after 30 seconds have elapsed. This then
effectively removes any connection local settings, in particular
a statement_timeout.

Effectively this meant that queries may unintentionally not have been
subject to the statement_timeout, if the same connection previously took
30 seconds to do its work. To resolve, remove the max lifetime.
lfittl added 2 commits January 6, 2025 14:42
This executes on-demand query runs for the Query Tuning feature via
a helper function that prevents reading the table data directly, as well
as granting the collector user unnecessary permissions.

This is now the only way that query runs can execute EXPLAIN ANALYZE,
direct execution without the helper is intentionally not supported.

In passing, expand the protobufs to pass down parameters and parameter
types and optional planner settings, and refactor the query run code to
handle work requiring a DB connection in a separate Go function.

Further, this adds tests that utilize a new TEST_DATABASE_URL enviroment
setting, which is used to run unit tests that require a database
connection (previously only the integration tests used the database).
Like "--generate-stats-helper-sql" this allows generating a SQL script
that installs the pganalyze.explain_analyze helper into all databases on
a server, as specified by the server section name passed as an argument.

Additionally, the "--generate-helper-explain-analyze-role" option allows
setting the role that the helper function should be owned by as a value,
defaulting to "pganalyze_explain".

Note that the role creation, as well as any GRANT statements for the role
must be taken care of separately.
@lfittl lfittl force-pushed the explain-function-helper branch from 5e7d30f to bd904f4 Compare January 6, 2025 22:44
@lfittl lfittl merged commit 7e338b0 into main Jan 6, 2025
3 checks passed
@lfittl lfittl deleted the explain-function-helper branch January 6, 2025 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants